Portable and fault-tolerant software systems
Author
Abstract
High availability is essential to heterogeneous computer networks, which are the basis of many systems ranging from the Internet to fly-by-wire flight controls. Development of highly available systems, however, is constrained by ever shorter times to market and the availability of off-the-shelf hardware and software (see the “Examples” box). Consequently, the economic necessity of using commodity products from different vendors puts a premium on the products’ fault tolerance. The development of fault-tolerant and portable software, particularly for parallel and distributed systems consisting of networks of binary-incompatible machines, continues to challenge engineers. In this article, I describe a new approach to developing fault-tolerant software. This approach has been validated by a prototype compiler developed by me and my MIT colleagues as part of ongoing research. Our primary goal is to develop source-to-source compiler technology that simplifies the process of adding fault tolerance to a computation. The programmer precompiles a program before generating an executable with a native compiler. The precompilation automatically generates code to save and recover from portable checkpoints, which capture the state of a computation in a machine-independent format. Portable checkpoints can be saved in a file or replicated on other machines in a network, and can be used to restore the computation on a binary-incompatible machine. We assume that the source program is likely to be correct, independently of whether it incorporates software reliability. Checkpointing a computation enables a restart on another machine in case of failure, regardless of whether the hardware or another software module causes the failure.
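The idea of a portable checkpoint can be illustrated with a small hand-written sketch. This is not the code the prototype compiler generates, and the abstract does not specify the checkpoint format; the layout below (a `CKPT` tag, a step counter, and a list of doubles in network byte order), along with the names `save_checkpoint`, `load_checkpoint`, and `advance`, are assumptions made purely for illustration.

```python
import struct

# Hypothetical checkpoint layout (an assumption, not the prototype's format):
# a 4-byte tag, the step counter as a signed 64-bit integer, the element
# count as an unsigned 32-bit integer, then each element as an IEEE 754
# double. The "!" prefix selects network (big-endian) byte order with
# standard sizes and no padding, so the bytes do not depend on the host.
MAGIC = b"CKPT"
HEADER = struct.Struct("!4sqI")


def save_checkpoint(step, values):
    """Encode the computation state as machine-independent bytes."""
    header = HEADER.pack(MAGIC, step, len(values))
    body = struct.pack(f"!{len(values)}d", *values)
    return header + body


def load_checkpoint(blob):
    """Decode bytes produced by save_checkpoint back into (step, values)."""
    magic, step, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a checkpoint")
    values = list(struct.unpack_from(f"!{count}d", blob, HEADER.size))
    return step, values


def advance(step, values, n):
    """Toy computation: extend a list of partial sums by n more steps."""
    values = list(values)
    for _ in range(n):
        values.append(values[-1] + 0.5 * step)
        step += 1
    return step, values


if __name__ == "__main__":
    # Run four steps, checkpoint, then resume from the checkpoint bytes.
    step, values = advance(0, [0.0], 4)
    blob = save_checkpoint(step, values)
    resumed = advance(*load_checkpoint(blob), 6)
    # The resumed run matches an uninterrupted ten-step run.
    print(resumed == advance(0, [0.0], 10))
```

Because the byte order and field widths are fixed explicitly, the same bytes could be written to a file or sent to another machine and decoded there regardless of that machine's native endianness or word size. A real system would also need to capture stack and heap state, which this sketch leaves out.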
Related resources
Voting Algorithm Based on Adaptive Neuro Fuzzy Inference System for Fault Tolerant Systems
Some applications are critical and must be designed as fault-tolerant systems. A voting algorithm is usually one of the principal elements of a fault-tolerant system. Two kinds of voting algorithm are used in most applications: the majority voting algorithm and the weighted average algorithm. Both have some problems. Majority voting confronts the problem of threshold limits and voter of weight...
Security and fault tolerance pdf
Process groups are a common abstraction for fault-tolerant computing in distributed systems. We present a security architecture that extends. Abstract Concerns about both security and fault-tolerance have had an important. Tion of fault tolerance will face some of the same problems, and benefit from. The security testing prototype is. A Secure and Fault-tolerant framework for Mobile. Of Computer ...
Software-based Fault-tolerant Clock Synchronization for Distributed Unix Environments
Fault-tolerant clock synchronization is often used in distributed systems with requirements such as close interaction between their components, measurements of elapsed time and ordering of events in the system. Different implementation approaches can be used to achieve fault-tolerant clock synchronization, depending on criteria such as performance, cost and availability of hardware and operating...
Extending the Features of Software for Reliability Analysis of Fault-tolerant Systems
The ASNA-2 software, an improved version of ASNA-1, is based on the technology of automated estimation of reliability indexes of fault-tolerant systems. It is designed for automated evaluation of the reliability indexes of fault-tolerant hardware-software systems. This paper describes the ASNA-2 software with the peculiarities of procedures of reliabil...
Journal title:
- IEEE Micro
Volume 18, Issue
Pages -
Publication date 1998